Abstract: We propose randomized frameproof codes for content protection, which arise by studying a variation of the Boneh-Shaw fingerprinting problem. In the modified system, whenever a user tries to access his fingerprinted copy, the fingerprint is submitted to a validation algorithm to verify that it is indeed permissible before the content can be executed. We show an improvement in the achievable rates compared to deterministic frameproof codes and traditional fingerprinting codes.

For coalitions of an arbitrary fixed size, we construct randomized frameproof codes which have an $O(n^2)$ complexity validation algorithm and probability of error $\exp(-\Omega(n))$, where $n$ denotes the length of the fingerprints. Finally, we present a connection between linear frameproof codes and minimal vectors for size-2 coalitions.

I. Introduction 

The availability of content (e.g., software, movies, and music) in digital format, despite its many advantages, has the downside that users can now easily copy, alter, and illegally share the content. There is therefore a pressing need to protect content against unauthorized redistribution, commonly termed piracy.

In this paper, we consider a variation of the Boneh-Shaw 
fingerprinting scheme [6] for content protection. We start with 
an informal description of the problem. We will refer to the 
legal content owner as the distributor and the legitimate license 
holders as users. The distributor embeds a unique hidden mark, 
called a fingerprint, which identifies each licensed copy. The 
fingerprint locations, however, remain the same for all users. 
The collection of fingerprints is called the codebook and the 
distributor uses some form of randomization in choosing the 
codebook. We assume that changes to the actual content render 
it useless, while the fingerprint may be subject to alterations. 
This assumption is reasonable, for instance, in applications to 
software fingerprinting. 

A single user is unable to pinpoint any of the fingerprint locations. However, if a set of users, called a coalition of pirates, compare their copies, they can infer some of the fingerprint locations by identifying the differences. The coalition now attempts to create a pirated copy with a forged fingerprint. In order to define the coalition's capability in creating the forgery, Boneh and Shaw introduced the marking assumption, which simply states that the coalition makes changes only in those positions where they find a difference (and hence are definitely fingerprint locations), as they do not wish to damage the content permanently.

The objective of the distributor is to trace one of the guilty 
users whenever such a pirated copy is found. The maximum 
coalition size is a parameter of the problem. Such a collection 
of fingerprints together with the tracing algorithm is called a 
fingerprinting code. This problem has been studied in detail 
in [6], [4], [11], [2], where various constructions and upper 
bounds have been presented. 

Consider now the modified system where each time a user 
accesses his fingerprinted copy, the fingerprint is validated 
to verify whether it is in fact permissible in the codebook 
being used and the execution continues only if the validation is 
successful. This limits the forgery possibilities for the pirates at 
the cost of an additional validation operation carried out every 
time a user accesses his copy. The idea is that by designing an 
efficient validation algorithm, we do not pay too high a price. 

The advantage of this scheme is demonstrated by an improvement in the achievable rates compared to traditional fingerprinting codes, even though the actual property (cf. Definition 2.2) is not in general weaker than fingerprinting. In addition, since the pirates are limited to creating only a valid fingerprint and because we are interested in unique decoding, there is no additional tracing needed. The distributor simply accuses the user corresponding to the fingerprint in the pirated copy as guilty.

In this case, the coalition is successful if it is able to forge 
the fingerprint of an innocent user, thus "framing" him as the 
pirate. The distributor's objective is to design codes for which 
the probability that this error event occurs is small, deriving 
the name frameproof codes. 

In the deterministic case with zero-error probability, frame- 
proof codes arise as a special case of separating codes, which 
have been studied over many years since being introduced in 
[8]. For further references on deterministic frameproof codes 
and separating codes, we refer the interested reader to [9], 
[7], [10], [5]. In order to emphasize the difference that we 
consider the randomized setting, we call our codes randomized 
frameproof codes. 

The rest of the paper is organized as follows. In Section II we give a formal definition of randomized frameproof codes. Achievable rates under no restrictions on validation complexity are presented in Section III. In Section IV we show the existence of linear frameproof codes and exhibit a connection to minimal vectors for size-2 coalitions. Finally, we design a concatenated code with efficient validation for arbitrary coalition sizes in Section V.

II. Problem definition 

We will use the following notation. Boldface will denote vectors. The Hamming distance between vectors $x_1, x_2$ will be denoted by $\mathrm{dist}(x_1, x_2)$. We also write $s_z(x_1, \dots, x_t)$ to denote the number of columns equal to $z^T$ in the matrix formed with the vectors $x_1, \dots, x_t$ as the rows. For a positive integer $n$, the shorthand notation $[n]$ will stand for the set $\{1, \dots, n\}$. We use $h(p) := -p \log_2 p - (1-p) \log_2 (1-p)$ to denote the binary entropy function and $D(p \| q) := p \log_2 (p/q) + (1-p) \log_2 ((1-p)/(1-q))$ to denote the information divergence.

Let Q be an alphabet (often a field) of finite size q and let 
M be the number of users in the system. Assume that there is 
some ordering of the users and denote their set by [M]. The 
fingerprint for each user is of length n. 

Consider the following random experiment. We have a family of $q$-ary codes $\{C_k, k \in \mathcal{K}\}$ of length $n$ and size $M$. In particular, here the code $C_k$ refers to an ordered set of $M$ codewords. We pick one of the codes according to the probability distribution $(\pi(k), k \in \mathcal{K})$. For brevity, the result of this random experiment is called a randomized code and is denoted by $C$. The rate of this code is $R = n^{-1} \log_q M$. We will refer to elements of the set $\mathcal{K}$ as keys. Note that the dependence on $n$ has been suppressed for simplicity.

The distributor assigns the fingerprints as follows. He chooses one of the keys, say $k$, with probability $\pi(k)$, and assigns to user $i$ the $i$th codeword of $C_k$, denoted by $C_k(i)$. Following the standard cryptographic precept that the adversary knows the system, we allow the users to be aware of the family of codes $\{C_k\}$ and the distribution $\pi(\cdot)$, but the exact key choice is kept secret by the distributor.

The fingerprints are assumed to be distributed within the 
host message in some fixed locations unknown to the users. 
Before a user executes his copy, his fingerprint is submitted to 
a validation algorithm, which checks whether the fingerprint 
is a valid codeword in the current codebook. The execution 
continues only if the validation succeeds. 

A coalition $U$ of $t$ users is an arbitrary $t$-subset of $[M]$. The members of the coalition are commonly referred to as pirates. Suppose the collection of fingerprints assigned to $U$, namely $C_k(U)$, is $\{x_1, \dots, x_t\}$. The goal of the pirates is to create a forged fingerprint different from theirs which is valid under the current key choice.

Coordinate $i$ of the fingerprints is called undetectable for the coalition $U$ if $x_{1i} = x_{2i} = \dots = x_{ti}$ and is called detectable otherwise. We assume that the coalition follows the marking assumption [6] in creating the forgery.

Definition 2.1: The marking assumption states that for any fingerprint $y$ created by the coalition $U$, $y_i = x_{1i} = x_{2i} = \dots = x_{ti}$ in every coordinate $i$ that is undetectable.



In other words, in creating y, the pirates can modify only 
detectable positions. 

For a given set of observed fingerprints $\{x_1, \dots, x_t\}$, the set of forgeries that can be created by the coalition is called the envelope. Its definition depends on the exact rule the coalition should follow to modify the detectable positions [4]:

• If the coalition is restricted to use only a symbol from their assigned fingerprints in the detectable positions, we obtain the narrow-sense envelope:

$$e(x_1, \dots, x_t) = \{y \in Q^n : y_i \in \{x_{1i}, \dots, x_{ti}\}, \ \forall i \in [n]\}; \tag{1}$$

• If the coalition can use any symbol from the alphabet in the detectable positions, we obtain the wide-sense envelope:

$$E(x_1, \dots, x_t) = \{y \in Q^n : y_i = x_{1i}, \ \forall i \text{ undetectable}\}. \tag{2}$$

For the binary alphabet, both envelopes are exactly the same. In the following, we will use $\mathcal{E}(\cdot)$ to denote the envelope from any of the rules mentioned above.
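To make rules (1) and (2) concrete, here is a minimal Python sketch that tests whether a candidate forgery lies in the narrow-sense or the wide-sense envelope of the observed fingerprints. It is only an illustration of the definitions; the function names are hypothetical, and fingerprints are represented as tuples of symbols.

```python
def narrow_envelope_contains(y, fingerprints):
    """Rule (1): in every coordinate, y_i must be one of the pirates' symbols x_{1i}, ..., x_{ti}."""
    return all(y_i in column for y_i, column in zip(y, zip(*fingerprints)))


def wide_envelope_contains(y, fingerprints):
    """Rule (2): y must agree with the pirates in every undetectable coordinate."""
    for y_i, column in zip(y, zip(*fingerprints)):
        undetectable = len(set(column)) == 1  # all pirates hold the same symbol here
        if undetectable and y_i != column[0]:
            return False
    return True


# Two ternary fingerprints; only the two middle coordinates are detectable.
pirates = [(0, 1, 2, 1), (0, 2, 0, 1)]
print(narrow_envelope_contains((0, 2, 2, 1), pirates))  # True: only observed symbols are used
print(narrow_envelope_contains((0, 0, 2, 1), pirates))  # False: 0 was never seen in the second coordinate
print(wide_envelope_contains((0, 0, 2, 1), pirates))    # True: the second coordinate is detectable
```

For binary fingerprints the two functions return the same answer on every input, matching the remark that both envelopes coincide when $q = 2$.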

Definition 2.2: A randomized code $C$ is said to be $t$-frameproof with $\varepsilon$-error if for all $U \subseteq [M]$ such that $|U| \le t$, it holds that

$$\Pr\{\mathcal{E}(C(U)) \cap (C \setminus C(U)) \neq \emptyset\} \le \varepsilon, \tag{3}$$

where the probability is taken over the distribution $\pi(\cdot)$.

Remark 2.3: Note that the $t$-frameproof property as defined above is not in general weaker than the $t$-fingerprinting property, i.e., a code which is $t$-fingerprinting with $\varepsilon$-error [6, Definition IV.2] is not automatically $t$-frameproof with $\varepsilon'$-error, for any $0 < \varepsilon' < 1$.

A straightforward extension of the fingerprinting definition would instead require a randomized code to satisfy only the following condition: for any coalition of size at most $t$ and any strategy they may use in devising a forgery, the probability that the forgery is valid is small. However, this definition would trivially enable us to achieve arbitrarily high rates. Hence, we use the above (stronger) definition.

III. Lower bounds for binary frameproof codes 

Let us construct a binary randomized code $C$ of length $n$ and size $M = 2^{nR}$ as follows. We pick each entry in the $M \times n$ matrix independently to be 1 with probability $p$, for some $0 < p < 1$.
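The construction is easy to simulate. The sketch below is our own illustration (function names and parameters are hypothetical): it draws the $M \times n$ Bernoulli($p$) matrix, takes the coalition to be the first $t$ users (the rows are i.i.d., so this loses no generality), and estimates the left-hand side of (3) by Monte Carlo. For binary codes the envelope is just the set of vectors agreeing with the pirates on their all-0 and all-1 columns.

```python
import random


def random_code(M, n, p, rng):
    """M x n binary matrix with i.i.d. entries equal to 1 with probability p."""
    return [tuple(int(rng.random() < p) for _ in range(n)) for _ in range(M)]


def frames_someone(code, coalition):
    """True if another user's codeword lies in the (binary) envelope of the coalition."""
    rows = [code[u] for u in coalition]
    # Undetectable coordinates are the all-0 and all-1 columns of the pirates' matrix.
    fixed = [(i, rows[0][i]) for i in range(len(rows[0]))
             if all(r[i] == rows[0][i] for r in rows)]
    members = set(coalition)
    return any(all(c[i] == b for i, b in fixed)
               for j, c in enumerate(code) if j not in members)


def estimate_error(M, n, p, t, trials, seed=0):
    """Monte Carlo estimate of the left-hand side of (3) for the coalition {0, ..., t-1}."""
    rng = random.Random(seed)
    coalition = list(range(t))
    hits = sum(frames_someone(random_code(M, n, p, rng), coalition) for _ in range(trials))
    return hits / trials


# t = 2, p = 1/2: the bound (4) reads R < 0.5.
print(estimate_error(M=84, n=64, p=0.5, t=2, trials=50))    # R ~ 0.10: estimate should be near 0
print(estimate_error(M=2000, n=16, p=0.5, t=2, trials=50))  # R ~ 0.69: estimate should be near 1
```

At such short lengths the estimates only show the threshold qualitatively; Theorem 3.1 is an asymptotic statement.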

Theorem 3.1: The randomized code $C$ is $t$-frameproof with error probability decaying exponentially in $n$ for any rate

$$R < -p^t \log_2 p - (1-p)^t \log_2 (1-p). \tag{4}$$
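Proof: For $\gamma > 0$, define the set of $t$-tuples of vectors

$$T_{t,\gamma} := \{(x_1, \dots, x_t) : s_{\mathbf{1}}(x_1, \dots, x_t) \in I_\gamma, \ s_{\mathbf{0}}(x_1, \dots, x_t) \in J_\gamma\},$$

where $s_{\mathbf{1}}$ and $s_{\mathbf{0}}$ count the all-one and all-zero columns, $I_\gamma := [n(p^t - \gamma), n(p^t + \gamma)]$ and $J_\gamma := [n((1-p)^t - \gamma), n((1-p)^t + \gamma)]$. It is clear that for any coalition $U$ of size $t$, the observed fingerprints $(x_1, \dots, x_t)$ belong to $T_{t,\gamma}$ with high probability¹. Hence, we will refer to $T_{t,\gamma}$ as the set of typical fingerprints. For any coalition $U$ of size $t$,

$$\Pr\{\mathcal{E}(C(U)) \cap (C \setminus C(U)) \neq \emptyset\} \le \Pr\{C(U) \notin T_{t,\gamma}\} + \Pr\{\exists y \in C \setminus C(U) : y \in \mathcal{E}(C(U)) \mid C(U) \in T_{t,\gamma}\}. \tag{5}$$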

The first term in the above equation decays exponentially in $n$. It is left to prove that the second term is also exponentially decaying for $R$ satisfying (4).

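A codeword in $C \setminus C(U)$ is a part of $\mathcal{E}(C(U))$ if it contains a 1 (resp. 0) in all $s_{\mathbf{1}}(C(U))$ (resp. $s_{\mathbf{0}}(C(U))$) positions; since its entries are independent of $C(U)$, this happens with probability $p^{s_{\mathbf{1}}(C(U))} (1-p)^{s_{\mathbf{0}}(C(U))}$. Since $C(U) \in T_{t,\gamma}$, by taking a union bound the second term in (5) is at most

$$2^{nR} \, p^{n(p^t - \gamma)} \, (1-p)^{n((1-p)^t - \gamma)},$$

which decays exponentially in $n$ for

$$R < -(p^t - \gamma) \log_2 p - ((1-p)^t - \gamma) \log_2 (1-p).$$

The proof is completed by taking $\gamma$ to be arbitrarily small. ■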
The bias $p$ in the construction of $C$ can be chosen optimally for each value of $t$. Numerical values of the rate thus obtained are shown in Table I, where they are compared with the existence bounds for deterministic zero-error frameproof codes (from [7]) and rates of fingerprinting codes (from [2], [1]). Observe that there is roughly a factor of $t$ improvement compared to the rate of deterministic frameproof codes.



TABLE I
Comparison of rates

| t | Randomized frameproof | Deterministic frameproof | Fingerprinting |
|---|---|---|---|
| 2 | 0.5 | 0.2075 | 0.25 |
| 3 | 0.25 | 0.0693 | 0.0833 |
| 4 | 0.1392 | 0.04 | 0.0158 |
| 5 | 0.1066 | 0.026 | 0.0006 |
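The Randomized frameproof column of Table I comes from maximizing the right-hand side of (4) over the bias $p$. A minimal Python sketch, assuming a plain grid search over $p$ is accurate enough (the function names are our own):

```python
import math


def rate_bound(t, p):
    """Right-hand side of (4): -p^t log2 p - (1 - p)^t log2 (1 - p), for 0 < p < 1."""
    return -(p ** t) * math.log2(p) - ((1 - p) ** t) * math.log2(1 - p)


def best_rate(t, grid=20_000):
    """Optimize the bias p by a grid search over the open interval (0, 1)."""
    p_star = max((k / grid for k in range(1, grid)), key=lambda p: rate_bound(t, p))
    return p_star, rate_bound(t, p_star)


for t in range(2, 6):
    p_star, r = best_rate(t)
    print(f"t = {t}: p* = {p_star:.3f}, R = {r:.4f}")
```

The printed rates should agree with the first column of the table up to rounding in the last digit. The maximizing bias is $1/2$ for $t = 2, 3$ but drops to roughly $0.25$ and $0.18$ for $t = 4, 5$, which is why $p$ has to be tuned for each $t$.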



IV. Linear frameproof codes 

Unlike fingerprinting codes, randomized frameproof codes eliminate the need for a tracing algorithm, but the fingerprints still need to be validated. As the validation algorithm is executed every time a user accesses his copy, we require that this algorithm have an efficient running time. Although the codes designed in the previous section have high rates, they come at the price of an $\exp(n)$ complexity validation algorithm, since validating against an unstructured random codebook amounts to searching among its $2^{nR}$ codewords. Linear codes are an obvious first choice in trying to design efficient frameproof codes, as they can be validated in $O(n^2)$ time by simply verifying the parity-check equations.

¹We say that an event occurs with high probability if the probability that it fails is at most $\exp(-cn)$, where $c$ is a positive constant.



A. Linear construction for t = 2 

We now present a binary linear frameproof code for $t = 2$ which achieves the rate given by Theorem 3.1. Suppose we have $M = 2^{nR}$ users. We construct a randomized linear code $C$ as follows. Pick a random $n(1-R) \times n$ parity-check matrix with each entry chosen independently to be 0 or 1 with equal probability. The corresponding set of binary vectors which satisfy the parity-check equations form a linear code of size $2^{nR}$ with high probability. Each user is then assigned a unique codeword selected uniformly at random from this collection. In the few cases that the code size exceeds $2^{nR}$, we simply ignore the remaining codewords during the assignment. However, note that since the validation algorithm simply verifies the parity-check equations, it will pronounce the ignored vectors also as valid.
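A minimal pure-Python sketch of this construction for toy parameters follows. It draws the random parity-check matrix, finds a basis of its null space by Gaussian elimination over GF(2), hands out distinct codewords uniformly at random, and validates a fingerprint by checking the $r = n(1-R)$ parity equations, which costs $O(rn) = O(n^2)$ bit operations. All function names are hypothetical; enumerating codewords through $2^k$ messages is only feasible for small dimension $k$.

```python
import random


def random_parity_check(r, n, rng):
    """r x n parity-check matrix with i.i.d. uniform {0, 1} entries (r = n(1 - R))."""
    return [[rng.randint(0, 1) for _ in range(n)] for _ in range(r)]


def is_valid(H, y):
    """Validation: y passes iff H y = 0 over GF(2)."""
    return all(sum(h & b for h, b in zip(row, y)) % 2 == 0 for row in H)


def null_space_basis(H, n):
    """Basis of {y : H y = 0} via Gauss-Jordan elimination over GF(2)."""
    rows = [row[:] for row in H]
    pivots = []
    for c in range(n):
        r = len(pivots)
        piv = next((i for i in range(r, len(rows)) if rows[i][c]), None)
        if piv is None:
            continue  # no pivot in this column: it is a free variable
        rows[r], rows[piv] = rows[piv], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c]:
                rows[i] = [a ^ b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
    basis = []
    for f in (c for c in range(n) if c not in pivots):
        v = [0] * n
        v[f] = 1
        for i, pc in enumerate(pivots):
            v[pc] = rows[i][f]  # pivot variable forced by row i of the reduced matrix
        basis.append(v)
    return basis


def assign_fingerprints(basis, M, rng):
    """Give M users distinct codewords chosen uniformly at random from the code."""
    k, n = len(basis), len(basis[0])
    fingerprints = []
    for msg in rng.sample(range(2 ** k), M):  # distinct messages give distinct codewords
        word = [0] * n
        for j in range(k):
            if (msg >> j) & 1:
                word = [a ^ b for a, b in zip(word, basis[j])]
        fingerprints.append(word)
    return fingerprints


rng = random.Random(1)
n, R = 24, 0.5
H = random_parity_check(int(n * (1 - R)), n, rng)
users = assign_fingerprints(null_space_basis(H, n), 100, rng)
print(all(is_valid(H, x) for x in users))  # every assigned fingerprint validates
```

If $H$ happens to be rank deficient, the basis has more than $nR$ vectors; the extra codewords are never assigned but still pass `is_valid`, exactly as noted above.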

Theorem 4.1: The randomized linear code C is 2- 
frameproof with error probability decaying exponentially in 
n for any rate R < 0.5. 

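Proof: As in the proof of Theorem 3.1, we begin by defining the set of typical pairs of fingerprints. For $\gamma > 0$, define

$$T_\gamma := \{(x_1, x_2) : s_{ij}(x_1, x_2) \in Z_\gamma, \ i, j \in \{0, 1\}\},$$

where $Z_\gamma := [n(1/4 - \gamma), n(1/4 + \gamma)]$. For any coalition $U$ of two users,

$$\Pr\{\mathcal{E}(C(U)) \cap (C \setminus C(U)) \neq \emptyset\} \le \Pr\{C(U) \notin T_\gamma\} + \sum_{(x_1, x_2) \in T_\gamma} \Pr\{C(U) = (x_1, x_2)\} \times \Pr\{\exists y \in C : y \in \mathcal{E}(x_1, x_2) \setminus \{x_1, x_2\} \mid C(U) = (x_1, x_2)\}.$$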

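It can be seen that the first term again decays exponentially in $n$. We now consider the term inside the summation,

$$\Pr\{\exists y \in C : y \in \mathcal{E}(x_1, x_2) \setminus \{x_1, x_2\} \mid C(U) = (x_1, x_2)\}.$$

Observe that for any two binary vectors $(x_1, x_2) \in T_\gamma$, the sum $x_1 + x_2 \notin \mathcal{E}(x_1, x_2)$ and also $\mathbf{0} \notin \mathcal{E}(x_1, x_2)$, since both vectors vanish on the $s_{11}(x_1, x_2) > 0$ coordinates where $x_1$ and $x_2$ are both 1. Therefore, every vector in $\mathcal{E}(x_1, x_2) \setminus \{x_1, x_2\}$ is linearly independent of $x_1, x_2$. Thus for any $y \in \mathcal{E}(x_1, x_2) \setminus \{x_1, x_2\}$,

$$\Pr\{y \in C \mid C(U) = (x_1, x_2)\} = \Pr\{y \in C\} = 2^{-n(1-R)}.$$

Since $(x_1, x_2) \in T_\gamma$, $|\mathcal{E}(x_1, x_2)| \le 2^{n(1/2 + 2\gamma)}$. By taking the union bound, the term inside the summation is at most $2^{-n(1/2 - R - 2\gamma)}$; taking $\gamma$ to be arbitrarily small, we obtain the result for any $R < 1/2$.

■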

B. Connection to minimal vectors 

In this subsection, we show a connection between linear 2-frameproof codes and minimal vectors. We first recall the definition of minimal vectors (see, e.g., [3]). Let $C$ be a $q$-ary $[n, k]$ linear code. The support of a vector $c \in C$ is given by $\mathrm{supp}(c) = \{i \in [n] : c_i \neq 0\}$. We write $c' \preceq c$ if $\mathrm{supp}(c') \subseteq \mathrm{supp}(c)$.

Definition 4.2: A nonzero vector $c \in C$ is called minimal if $c' \preceq c$ implies $c' = ac$, where $c'$ is another nonzero code vector and $a$ is a nonzero constant.



Proposition 4.3: For any $x_1, x_2 \in C$, $x_1 \neq x_2$, if $x_2 - x_1$ is minimal then $e(x_1, x_2) \cap (C \setminus \{x_1, x_2\}) = \emptyset$. If $q = 2$, the converse is also true.

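Proof: Consider any $y \in Q^n$ and define the translate $y' := y - x_1$. It follows that

$$y \notin \{x_1, x_2\} \iff y' \notin \{\mathbf{0}, x_2 - x_1\}, \tag{6}$$

$$y \in C \iff y' \in C \quad \text{(by linearity, since } x_1 \in C\text{)}. \tag{7}$$

Furthermore, if $y_i \in \{x_{1i}, x_{2i}\}$, then $y'_i \in \{0, x_{2i} - x_{1i}\}$, for all $i \in [n]$. Therefore,

$$y \in e(x_1, x_2) \implies y' \preceq x_2 - x_1 \ \text{ and } \ \big(y' = a(x_2 - x_1) \Rightarrow a \in \{0, 1\}\big). \tag{8}$$

Using (6), (7), (8), we obtain that $e(x_1, x_2) \cap (C \setminus \{x_1, x_2\}) \neq \emptyset$ implies that $x_2 - x_1$ is non-minimal: any $y$ in this intersection gives a nonzero codeword $y' \preceq x_2 - x_1$ that is not a scalar multiple of $x_2 - x_1$.

For $q = 2$, it is easily seen that the reverse implication also holds in (8), and thus the converse is also true. ■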
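For small binary codes, Proposition 4.3 can be checked exhaustively. The sketch below (our own illustration; the generator matrix and function names are hypothetical) enumerates a $[6, 4]$ binary code and confirms that, for every pair of distinct codewords, the narrow-sense envelope contains a third codeword exactly when $x_1 + x_2$ is non-minimal.

```python
import itertools


def span_gf2(generators):
    """All codewords of the binary linear code spanned by the generator rows."""
    n = len(generators[0])
    code = set()
    for coeffs in itertools.product((0, 1), repeat=len(generators)):
        word = [0] * n
        for c, g in zip(coeffs, generators):
            if c:
                word = [a ^ b for a, b in zip(word, g)]
        code.add(tuple(word))
    return code


def is_minimal(c, code):
    """Definition 4.2 for q = 2: no other nonzero codeword has support inside supp(c)."""
    supp = {i for i, ci in enumerate(c) if ci}
    return bool(supp) and all(
        not any(w) or w == c or not {i for i, wi in enumerate(w) if wi} <= supp
        for w in code)


def framed(x1, x2, code):
    """True if e(x1, x2) contains a codeword other than x1 and x2 (rule (1))."""
    return any(w not in (x1, x2) and all(wi in (a, b) for wi, a, b in zip(w, x1, x2))
               for w in code)


generators = [(1, 0, 0, 1, 1, 0), (0, 1, 0, 1, 0, 1), (0, 0, 1, 0, 1, 1), (1, 1, 1, 1, 1, 1)]
C = span_gf2(generators)
agree = all(framed(x1, x2, C) == (not is_minimal(tuple(a ^ b for a, b in zip(x1, x2)), C))
            for x1 in C for x2 in C if x1 != x2)
print(agree)  # True, as Proposition 4.3 predicts for q = 2
```

The all-ones generator is deliberately included so that some differences are non-minimal; otherwise every pair would be trivially unframeable and the check would not exercise the converse.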

Recall the random linear code constructed by generating a random $n(1-R) \times n$ parity-check matrix in the previous subsection. With some abuse of notation, let us denote the (unordered) set of vectors satisfying the random parity-check matrix also by $C$. Let $\mathcal{M}(C)$ denote the set of minimal vectors in $C$. We have the following companion result to Corollary 2.5 in [3].

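Corollary 4.4: As $n \to \infty$,

$$\mathbb{E}\left[\frac{|\mathcal{M}(C)|}{|C| - 1}\right] \to \begin{cases} 1, & R < 1/2, \\ 0, & R > 1/2, \end{cases}$$

where the expectation is over the random parity-check matrix.

Proof: As a consequence of Proposition 4.3, for any two users $\{u_1, u_2\}$, we obtain

$$\Pr\{\mathcal{E}(C(u_1, u_2)) \cap (C \setminus C(u_1, u_2)) \neq \emptyset\} = \Pr\{C(u_2) - C(u_1) \notin \mathcal{M}(C)\} = 1 - \mathbb{E}\left[\frac{|\mathcal{M}(C)|}{|C| - 1}\right].$$

The first part of the result is now true by Theorem 4.1. We skip the details of the latter part, which is easily proved using Chernoff bounds. ■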

C. Linear codes for larger t 

In light of Theorem 4.1, a natural question to ask is whether there exist randomized linear frameproof codes for $t > 2$, perhaps allowing even a larger alphabet. It turns out that, just as in the deterministic case, linear frameproof codes do not always exist in the randomized setting either.

Proposition 4.5: There do not exist $q$-ary linear $t$-frameproof codes with $\varepsilon$-error, $0 < \varepsilon < 1$, which are secure with the wide-sense envelope if either $t > q$ or $q > 2$.

Proof: Consider a coalition of $q + 1$ users. For any linear code realized from the family where the observed fingerprints are, say, $x_1, \dots, x_{q+1}$, the forgery $y = x_1 + \dots + x_{q+1}$ is a part of $E(x_1, \dots, x_{q+1})$: in an undetectable coordinate all $q + 1$ pirates hold the same symbol $b$, and $(q+1)b = b$ in a field of size $q$. In addition, it is also a valid fingerprint as the code is linear. This proves the first part of the proposition.

To prove the second part, consider an alphabet (a field) with $q > 2$. For any two pirates with fingerprints $x_1$ and $x_2$, the forgery $y = a x_1 + (1 - a) x_2$, where $a \neq 0, 1$, is a valid codeword (by linearity) and is also a part of the wide-sense envelope. ■

Consequently, in considering linear frameproof codes which are wide-sense secure, we are limited to $t = 2$, $q = 2$.
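Both forgeries in the proof can be built mechanically. The sketch below is illustrative only: it assumes a prime $q$, so that integers modulo $q$ form the field $GF(q)$ (prime powers would need proper field arithmetic), and the function names are hypothetical. Since the forgeries are linear combinations of the pirates' codewords, any linear code containing the $x_j$ also contains them; the code checks only the envelope membership.

```python
import random


def in_wide_envelope(y, fingerprints):
    """Rule (2): y must match the pirates in every coordinate where they all agree."""
    return all(y_i == col[0] for y_i, col in zip(y, zip(*fingerprints)) if len(set(col)) == 1)


def sum_forgery(fingerprints, q):
    """First part of the proof: y = x_1 + ... + x_{q+1}, computed mod q (q prime)."""
    return tuple(sum(col) % q for col in zip(*fingerprints))


def affine_forgery(x1, x2, a, q):
    """Second part of the proof: y = a x_1 + (1 - a) x_2 mod q, with a not in {0, 1}."""
    return tuple((a * u + (1 - a) * v) % q for u, v in zip(x1, x2))


rng = random.Random(0)
q, n = 3, 12
shared = [rng.randrange(q) for _ in range(4)]  # force four undetectable coordinates
pirates = [tuple(shared + [rng.randrange(q) for _ in range(n - 4)]) for _ in range(q + 1)]
print(in_wide_envelope(sum_forgery(pirates, q), pirates))        # True
x1, x2 = pirates[0], pirates[1]
print(in_wide_envelope(affine_forgery(x1, x2, 2, q), [x1, x2]))  # True
```

When $x_1 \neq x_2$, the affine forgery also differs from both pirates, since $y - x_1 = (1-a)(x_2 - x_1)$ and $y - x_2 = a(x_1 - x_2)$ are nonzero.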

V. Polynomial-time validation for larger t 

Usually, the amount of redundancy needed increases with the alphabet size in fingerprinting applications. Thus, we are mainly interested in constructing binary frameproof codes which have polynomial-time validation. With the binary alphabet, there is no distinction between wide-sense and narrow-sense envelopes. Therefore, there do not exist binary linear frameproof codes for $t > 2$ by Proposition 4.5. In this section, we use the idea of code concatenation to construct a binary frameproof code with polynomial-time validation.

In the case of deterministic codes, if both the inner and outer codes are $t$-frameproof ($(t, 1)$-separating) with zero error, then the concatenated code is also $t$-frameproof. We will now establish a similar result when the inner code is a randomized $t$-frameproof code.

Let the outer code $C_{\mathrm{out}}$ be a (deterministic) $q$-ary linear $[N, K, \Delta]$ code. For each of the $N$ coordinates of the outer code, generate an independent instance of a randomized binary code $C_{\mathrm{in}}$ of length $m$ and size $q$ which is $t$-frameproof with $\varepsilon$-error. Then the concatenated code $C$ with outer code $C_{\mathrm{out}}$ and inner code independent instances of $C_{\mathrm{in}}$ is a randomized binary code of length $n = Nm$ and size $q^K$.

Theorem 5.1: If the relative minimum distance of $C_{\mathrm{out}}$ satisfies

$$\frac{\Delta}{N} \ge 1 - \frac{1}{t}(1 - \xi) \tag{9}$$

and the error probability $\varepsilon < \xi$ for $C_{\mathrm{in}}$, then the concatenated code $C$ is $t$-frameproof with error probability $2^{-N D(\xi \| \varepsilon)}$ and has a $\mathrm{poly}(n)$ validation algorithm.

Proof: In the proof, all vectors are $q$-ary, corresponding to the outer alphabet. Define

$$s(y, \{x_1, \dots, x_t\}) := |\{i \in [N] : y_i \in \{x_{1i}, \dots, x_{ti}\}\}|,$$

$$d(y, \{x_1, \dots, x_t\}) := \min_{l \in [t]} \mathrm{dist}(y, x_l).$$



Consider a coalition $U \subseteq \{1, \dots, q^K\}$ of size $t$. For any coordinate $i \in [N]$ of the outer code, the coalition observes at most $t$ different symbols of the outer alphabet, i.e., at most $t$ different codewords of the inner code. Thus if the $t$-frameproof property holds for the observed symbols for the realization of $C_{\mathrm{in}}$ at coordinate $i$, then at the outer level the coalition is restricted to output one of the symbols it observes, i.e., the narrow-sense rule (1) holds. On the other hand, a failure of the $t$-frameproof property at the inner-level code implies that the coalition is able to create a symbol different from what they observe in the corresponding coordinate at the outer level.

Accordingly, let $X_i$, $i = 1, \dots, N$, denote the indicator random variables (r.v.s) for failures at the inner level, with $\Pr\{X_i = 1\} \le \varepsilon$ since the inner code has $\varepsilon$-error. Note that the $X_i$ are independent because we have an independent instance of the randomized code for every $i = 1, \dots, N$. Then $Z = \sum_{i=1}^{N} X_i$, the number of coordinates where the narrow-sense rule fails at the outer level, is stochastically dominated by a Binomial$(N, \varepsilon)$ r.v. For $0 \le z \le N$, let $e_z(\cdot)$ denote the envelope when the narrow-sense rule is followed only in some $N - z$ outer-level coordinates, i.e.,

$$e_z(x_1, \dots, x_t) = \{y : s(y, \{x_1, \dots, x_t\}) \ge N - z\}.$$

For any $y \in e_z(x_1, \dots, x_t)$, there exists some $l \in \{1, \dots, t\}$ such that $s(y, x_l) \ge (N - z)/t$ (by the pigeonhole principle, as the at least $N - z$ agreeing coordinates are shared among $t$ pirates), i.e., $\mathrm{dist}(y, x_l) \le N - (N - z)/t$. Therefore,

$$e_z(x_1, \dots, x_t) \subseteq \left\{y : d(y, \{x_1, \dots, x_t\}) \le N - \frac{N - z}{t}\right\}. \tag{10}$$
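Inclusion (10) is a pigeonhole bound, and a quick randomized sanity check can make it tangible. The sketch below is a test of the inequality on random instances, not a proof; the function name and parameters are hypothetical. It builds forgeries that mostly copy pirate symbols, computes the smallest $z$ with $y \in e_z(\cdot)$, and checks the distance bound.

```python
import random


def check_envelope_radius(N, t, q, trials, seed=0):
    """Empirically check (10): if y agrees with some pirate symbol in s = N - z coordinates,
    then some pirate x_l satisfies dist(y, x_l) <= N - (N - z)/t."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [[rng.randrange(q) for _ in range(N)] for _ in range(t)]
        # Copy a random pirate's symbol in every coordinate, then overwrite a random subset.
        y = [rng.choice(xs)[i] for i in range(N)]
        for i in rng.sample(range(N), rng.randrange(N + 1)):
            y[i] = rng.randrange(q)
        s = sum(y[i] in {x[i] for x in xs} for i in range(N))  # s(y, {x_1, ..., x_t})
        z = N - s                                              # smallest z with y in e_z
        d = min(sum(a != b for a, b in zip(y, x)) for x in xs)  # d(y, {x_1, ..., x_t})
        if d > N - (N - z) / t:
            return False
    return True


print(check_envelope_radius(N=20, t=3, q=5, trials=1000))  # True
```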

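The coalition $U$ succeeds when it creates a forgery which is valid in the outer code. Thus the probability of error is at most

$$\begin{aligned}
&\Pr\{\exists y \in C_{\mathrm{out}} \setminus C_{\mathrm{out}}(U) : y \in e_Z(C_{\mathrm{out}}(U))\} \\
&\quad \le \Pr\left\{\exists y \in C_{\mathrm{out}} \setminus C_{\mathrm{out}}(U) : d(y, C_{\mathrm{out}}(U)) \le N - \frac{N - Z}{t}\right\} && (11) \\
&\quad \le \Pr\left\{\Delta \le N - \frac{N - Z}{t}\right\} && (12) \\
&\quad \le \Pr\{Z \ge N\xi\} && (13) \\
&\quad \le 2^{-N D(\xi \| \varepsilon)}, && (14)
\end{aligned}$$

where (11) follows from (10), (12) is because $C_{\mathrm{out}}$ is a linear code with minimum distance $\Delta$, (13) is due to the condition (9), and (14) is obtained by standard large deviation bounds.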
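Steps (13) and (14) can be checked numerically. The sketch below (our own illustration, with hypothetical parameter values) compares the exact Binomial$(N, \varepsilon)$ tail, which upper-bounds the tail of $Z$ by stochastic dominance, with the Chernoff bound $2^{-N D(\xi \| \varepsilon)}$, and finds the largest dimension of a Reed-Solomon outer code, whose minimum distance is $\Delta = N - K + 1$, that satisfies condition (9).

```python
import math


def kl_bits(a, b):
    """Information divergence D(a || b) in bits, as defined in Section II (0 < a, b < 1)."""
    return a * math.log2(a / b) + (1 - a) * math.log2((1 - a) / (1 - b))


def binomial_tail(N, eps, k):
    """Exact Pr{Z >= k} for Z ~ Binomial(N, eps)."""
    return sum(math.comb(N, j) * eps ** j * (1 - eps) ** (N - j) for j in range(k, N + 1))


def meets_condition_9(N, delta, t, xi):
    """Condition (9) on the relative minimum distance of the outer code."""
    return delta / N >= 1 - (1 - xi) / t


# Step (14): Pr{Z >= N xi} <= 2^{-N D(xi || eps)} whenever eps < xi.
N, eps, xi = 60, 0.01, 0.1
tail = binomial_tail(N, eps, math.ceil(N * xi))
bound = 2 ** (-N * kl_bits(xi, eps))
print(tail, bound, tail <= bound)

# Largest K for which an [N, K, N - K + 1] Reed-Solomon code satisfies (9), with N = 255, t = 3.
t, N_rs = 3, 255
K_max = max(K for K in range(1, N_rs + 1) if meets_condition_9(N_rs, N_rs - K + 1, t, xi))
print(K_max, K_max / N_rs)  # outer rate just above (1 - xi) / t = 0.3
```

The same `kl_bits` helper also illustrates the approximation $D(\xi \| \varepsilon) \approx \xi \log_2(1/\varepsilon)$ used below: for $\varepsilon = 2^{-20}$ and $\xi = 0.1$ it differs from $\xi \log_2(1/\varepsilon) - h(\xi)$ by about $10^{-6}$.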

The validation algorithm operates in two steps. In the first step, the inner code is decoded/validated for every outer code coordinate by exhaustive search over $q$ codewords. We then check whether the resulting $q$-ary vector is a member of the outer code by verifying the parity-check equations. The claim about the polynomial-time complexity is true by choosing an appropriate scaling for the inner code length, for instance, $m \sim \log_2 n$. ■
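The two-step validation can be sketched on a toy instance. To avoid $GF(2^k)$ arithmetic, this illustration assumes a prime outer alphabet $q = 7$ with primitive element 3, a $[q-1, K] = [6, 2]$ Reed-Solomon outer code defined by evaluation at powers of that element, and inner codes made of $q$ distinct uniformly random binary words of length $m = 8$ (roughly the $p = 1/2$ construction of Section III). None of these parameters or names come from the construction above; they only show the structure. Step 1 costs about $Nqm$ bit comparisons and step 2 about $N(N-K)$ field operations.

```python
import random

Q, ALPHA = 7, 3  # toy outer alphabet GF(7); 3 generates the multiplicative group mod 7
N, K = Q - 1, 2  # [q - 1, K] Reed-Solomon outer code
M_INNER = 8      # inner code length m (toy value)


def make_inner_code(rng):
    """Q distinct random binary words of length M_INNER: one realization of an inner code."""
    return [tuple((v >> b) & 1 for b in range(M_INNER))
            for v in rng.sample(range(2 ** M_INNER), Q)]


def rs_encode(message):
    """Evaluate f(X) = sum_i message[i] X^i at ALPHA^0, ..., ALPHA^(N-1) over GF(Q)."""
    return [sum(f * pow(ALPHA, i * j, Q) for i, f in enumerate(message)) % Q for j in range(N)]


def rs_is_codeword(c):
    """Outer parity checks: sum_j c_j ALPHA^(j l) = 0 (mod Q) for l = 1, ..., N - K."""
    return all(sum(cj * pow(ALPHA, j * l, Q) for j, cj in enumerate(c)) % Q == 0
               for l in range(1, N - K + 1))


def concatenate(message, inner_codes):
    """Binary fingerprint: outer symbol j is written with coordinate j's own inner code."""
    return [bit for j, s in enumerate(rs_encode(message)) for bit in inner_codes[j][s]]


def validate(y, inner_codes):
    """Step 1: exhaustive search in each inner code. Step 2: outer parity checks."""
    symbols = []
    for j in range(N):
        block = tuple(y[j * M_INNER:(j + 1) * M_INNER])
        matches = [s for s, w in enumerate(inner_codes[j]) if w == block]
        if not matches:
            return False  # this block is not a valid inner codeword
        symbols.append(matches[0])
    return rs_is_codeword(symbols)


rng = random.Random(0)
inner = [make_inner_code(rng) for _ in range(N)]  # independent instance per outer coordinate
x = concatenate([4, 5], inner)
print(validate(x, inner))  # True: a genuine fingerprint passes
```

Replacing one block by a different inner codeword passes step 1 but fails step 2, because the outer code has minimum distance $N - K + 1 = 5$.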

We now make specific choices for the outer and inner codes in Theorem 5.1 to arrive at explicit constructions. We take $C_{\mathrm{in}}$ to be the binary randomized $t$-frameproof code presented in Theorem 3.1, with growing length. Thus we have the inner code rate

$$R_t = \max_{p \in [0, 1]} \left[-p^t \log_2 p - (1-p)^t \log_2 (1-p)\right]$$

and error probability $\varepsilon = 2^{-m\beta}$ for some $\beta > 0$. The outer code $C_{\mathrm{out}}$ is a $[q - 1, K]$ Reed-Solomon (RS) code with rate at most $(1 - \xi)/t$ in order to satisfy the condition (9) on the minimum distance. Observe that for $\varepsilon$ approaching 0 (for large $m$) and $\xi$ fixed, $D(\xi \| \varepsilon) \approx \xi \log_2 (1/\varepsilon)$. Therefore, with $\varepsilon = 2^{-m\beta}$, the error probability of the concatenated code is at most $2^{-n(\xi\beta + o(1))}$. By taking $\xi$ arbitrarily small and $m$ sufficiently large to satisfy $\varepsilon < \xi$, we obtain the following result.

Corollary 5.2: The binary randomized code obtained by concatenating $C_{\mathrm{out}}$ and $C_{\mathrm{in}}$ is $t$-frameproof with error probability $\exp(-\Omega(n))$, validation complexity $O(n^2)$, and rate arbitrarily close to $R_t/t$.

VI. Conclusion 

The question of upper bounds on the rate of randomized 
frameproof codes is open. 